
    Architectural Solutions for NanoMagnet Logic

    The successful era of CMOS technology is coming to an end. The limit on the minimum fabrication dimensions of transistors and the increasing leakage power hinder the technological scaling that has characterized the last decades. This problem has been addressed in several ways by changing the architectures implemented in CMOS, adopting parallel processors and thus increasing the throughput at the same operating frequency. However, architectural alternatives cannot be the definitive answer to the continuous increase in performance dictated by Moore’s law. This problem must be addressed from a technological point of view. Several alternative technologies that could replace CMOS in the coming years are currently under study. Among them, magnetic technologies such as NanoMagnet Logic (NML) are interesting because they do not dissipate any leakage power. Moreover, magnets have memory capability, so it is possible to merge logic and memory in the same device. However, magnetic circuits, and NML in this specific research, also have some important drawbacks that need to be addressed: first, the circuit clock frequency is limited to 100 MHz to avoid errors in data propagation; second, there is a connection between circuit layout and timing, and in particular, longer wires have longer latency. These drawbacks are intrinsic to the technology and for this reason they cannot be avoided. The only option is to limit their impact from an architectural point of view. The first step followed in the research path of this thesis is indeed the choice and optimization of architectures able to deal with the problems of NML.
Systolic Arrays are identified as an ideal solution for this technology, because they are regular structures with local interconnections that limit the long latency of wires; moreover, they are composed of several Processing Elements that work in parallel, thus exploiting parallelism to increase throughput (limiting the impact of the low clock frequency). Through the analysis of Systolic Arrays for NML, several possible improvements have been identified and addressed: 1) a rigorous way to increase throughput with interleaving has been defined, providing equations for estimating the number of operations to be interleaved and the rules for providing inputs; 2) a latency-insensitive circuit has been designed that exploits a data communication protocol between processing elements to avoid data synchronization problems. This feature has been exploited to design a latency-insensitive Systolic Array that is able to execute the Floyd-Steinberg dithering algorithm. All the improvements presented in this framework apply to Systolic Arrays implemented in any technology, so they can also be exploited to increase the performance of today’s CMOS parallel circuits. This research path is presented in Chapter 3. While Systolic Arrays are an interesting solution for NML, their usage could be quite limited because they are normally application-specific. The second research path addresses this problem. A Reconfigurable Systolic Array is presented that can be programmed to execute several algorithms. This architecture has been tested by implementing many algorithms, including FIR and IIR filters, the Discrete Cosine Transform and Matrix Multiplication. This research path is presented in Chapter 4. In common Von Neumann architectures, the logic part of the circuit and the memory part are separated. Today, bus communication between logic and memory represents the bottleneck of the system.
This problem is addressed by presenting Logic-In-Memory (LIM), an architecture where memory elements are merged with logic ones. This research path aims at defining a real LIM architecture. This has been done in two steps. The first step is represented by an architecture composed of three layers: memory, routing and logic. In the second step, instead, the routing plane is no longer present, and its features are inherited by the memory plane. In this solution, a pyramidal memory model is used, where memories near logic elements contain the data most likely to be used, and other memory layers contain the remaining data and the instruction set. This circuit has been tested with odd-even sort algorithms and it has been benchmarked against GPUs and ASICs. This research path is presented in Chapter 5. MagnetoElastic NML (ME-NML) is a technological improvement of the NML principle, proposed by researchers of Politecnico di Torino, where the clock system is based on the induced stretch of a piezoelectric substrate when a voltage is applied to its boundaries. The main advantage of this solution is that it consumes much less power than the classic clock implementation. This technology has not yet been investigated from an architectural point of view or for complex circuits. In this research field, a standard methodology for the design of ME-NML circuits has been proposed. It is based on a Standard Cell Library and an enhanced VHDL model. The effectiveness of this methodology has been proved by designing a Galois Field Multiplier. Moreover, the serial-parallel trade-off in ME-NML has been investigated by designing three different solutions for the Multiply and Accumulate structure. This research path is presented in Chapter 6. While ME-NML is an extremely interesting technology, it needs to be combined with other, faster technologies to obtain a truly competitive system. Signal interfaces between NML and other technologies (mainly CMOS) have rarely been presented in the literature.
A mixed-technology multiplexer is designed and presented as the basis for a CMOS to NML interface. The reverse interface (from ME-NML to CMOS) is instead based on a sensing circuit for the Faraday effect: a change in the polarization of a magnet induces an electric field that can be used to generate an input signal for a CMOS circuit. This research path is presented in Chapter 7. The research work presented in this thesis represents a fundamental milestone in the path towards nanotechnologies. The most important achievement is the design and simulation of complex circuits with NML, benchmarking this technology with real application examples. The characterization of a technology on complex functions is a major step that has not yet been addressed in the literature for NML. Indeed, only in this way is it possible to detect in advance any weakness of NanoMagnet Logic that cannot be discovered by considering only small circuits. Moreover, the architectural improvements introduced in this thesis, although technology-driven, can actually be applied to any technology. We have demonstrated the advantages that can derive from applying them to CMOS circuits. This thesis therefore represents a major step in two directions: the first is the enhancement of NML technology; the second is a general improvement of parallel architectures and the development of the new Logic-In-Memory paradigm.
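
The abstract above names odd-even sort as the test algorithm for the Logic-In-Memory circuit. As a rough illustration of why that algorithm suits architectures with only local interconnections, here is a minimal software sketch of odd-even transposition sort in Python. It is not the thesis's hardware implementation; the function name `odd_even_sort` is a hypothetical choice made for this example. Each phase only compares and swaps adjacent elements, so every exchange involves two neighbouring cells.

```python
def odd_even_sort(values):
    """Odd-even transposition sort (illustrative sketch only).

    Phases alternate between pairs starting at even indices and pairs
    starting at odd indices. All compare-exchange steps within one phase
    touch disjoint neighbouring pairs, so they could run in parallel.
    """
    data = list(values)  # work on a copy, leave the input untouched
    n = len(data)
    for phase in range(n):  # n phases are enough to sort n elements
        start = phase % 2   # 0: pairs (0,1),(2,3)...  1: pairs (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data
```

For example, `odd_even_sort([5, 1, 4, 2, 3])` returns the list in ascending order without modifying the input.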

    Protein Alignment Systolic Array Throughput Optimization

    Protein comparison is gaining importance year after year, since it has been demonstrated that biologists can find correlations between different species, or genetic mutations that can lead to cancer and genetic diseases. Protein sequence alignment is the most computationally intensive task when performing protein comparison. In order to speed up alignment, dedicated processors that can perform different computations in parallel have been designed. Among them, the best performance has been achieved using Systolic Arrays. However, when the Processing Elements of the Systolic Array have an internal loop, performance can be greatly reduced. In this work we present an architectural strategy to address this problem by applying pipeline interleaving; this strategy is applied to a Systolic Array for the Smith-Waterman algorithm that we designed. Results encourage the adoption of pipeline interleaving for parallel circuits with loop-based Processing Elements. We demonstrate that important benefits in terms of higher operating frequency can be obtained without significant costs in terms of increased complexity, area and power.
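
For readers unfamiliar with the Smith-Waterman algorithm mentioned above, the following Python sketch computes the best local alignment score with a linear gap penalty. It is only a software illustration of the recurrence, not the systolic array or the interleaving scheme from the paper. The function name and the default scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for this example; real protein alignment would typically use a substitution matrix instead.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of sequences a and b (sketch).

    Scoring values are illustrative assumptions. Each cell depends on
    its upper, left and upper-left neighbours, and is clamped at 0 so
    that an alignment can start anywhere.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Only two rows are kept in memory because the function returns the score alone; recovering the alignment itself would require storing the full matrix and tracing back from the best cell.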

    Parallel and Serial Computation in Nanomagnet Logic: An Overview

    Nanomagnet logic (NML) is a promising technology beyond CMOS technology because it can guarantee an extremely low-power consumption. This technology is extremely different from CMOS in some peculiar aspects: 1) logic gates and wires have the same delay and 2) the layout of a circuit influences its timing characteristics. With these characteristics, it is clear that a simple mapping of CMOS register-transfer level circuits in NML would be inefficient. Circuit logic design must be adapted to this new technology. One interesting aspect is the opportunity to design bit-serial circuits instead of parallel ones and achieve comparable performance with less area occupation. In this paper, we explore the parallel and serial design in NML with magnetoelastic clock through a common case study: the multiply and accumulate algorithm. This is designed in three different versions (fully parallel, fully serial, and parallel–serial) and analyzed in terms of latency, throughput, area occupation, and circuit dissipation
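
To make the serial versus parallel trade-off in the abstract above more concrete, the following Python sketch contrasts a word-level multiply and accumulate with a bit-serial shift-and-add version. It is a behavioural illustration only: the function names, the 8-bit default `width`, and the simple cycle counter are assumptions for this example and do not model NML timing, area or dissipation.

```python
def mac_parallel(xs, ys):
    """Word-level multiply and accumulate: one product per step."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def mac_bit_serial(xs, ys, width=8):
    """Bit-serial multiply and accumulate (shift-and-add sketch).

    Processes one bit of y per cycle. Assumes unsigned operands with
    y < 2**width. Returns (result, cycles), where cycles is only an
    illustrative step count, not a hardware latency figure.
    """
    acc = 0
    cycles = 0
    for x, y in zip(xs, ys):
        partial = 0
        for bit in range(width):
            if (y >> bit) & 1:       # examine one multiplier bit per cycle
                partial += x << bit  # add the shifted multiplicand
            cycles += 1
        acc += partial
    return acc, cycles
```

Both functions produce the same sum of products; the serial version takes `width` steps per operand pair but needs only a single adder-sized datapath, which is the kind of trade-off the paper analyzes.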

    Interleaving in Systolic-Arrays: A Throughput Breakthrough


    A reconfigurable array architecture for NML

    NanoMagnet Logic (NML) is one of the most promising emerging technologies, in particular for its low power consumption and for the capability to mix logic and memory in the same device. At the same time, this technology has some drawbacks, the most important of which are the long delay of wires and the correlation between layout and circuit timing. From a technological point of view, MagnetoElastic NML (ME-NML) is one of the proposed improvements that could address some of these drawbacks. From an architectural point of view, instead, parallel solutions like Systolic Arrays can be adopted to exploit the peculiar characteristics of NML and reduce the impact of its drawbacks. Systolic Arrays are commonly used as hardware accelerators dedicated to a single algorithm, and for this reason their field of use has been extremely limited. Reconfigurable Arrays can overcome this limitation. In this article we first introduce our Reconfigurable Systolic Array. It can be configured to execute different algorithms and it is therefore an ideal architecture for NML. The Reconfigurable Systolic Array was first designed at the RTL level in CMOS and synthesized using a 28nm technology. Then, it was synthesized and simulated in classic NML using ToPoliNano, the first existing tool for NML. Finally, a custom layout based on ME-NML was designed, and we estimated its area and power dissipation. Comparison among the technologies shows that ME-NML is extremely promising in terms of area occupation and power dissipation. Even if the technology is not yet mature, it can already compete with CMOS.

    A Framework for Network-On-Chip Comparison Based on OpenSPARC T2 Processor

    Network-on-Chip (NoC) has been gaining interest in recent years thanks to its regular and scalable design. Several topologies have been proposed, and there is a need for a general framework for their test, validation and comparison. In this article a framework based on the OpenSPARC T2 processor is presented, where the NoC is used to replace the Cache Crossbar. With the introduction of protocol translators, it is possible to accommodate any NoC inside the T2. Processor regression tests can be used to validate the design and evaluate timing performance.

    Possible distributions of the number of synapses of a single connection resulting from the interaction of synaptic and structural plasticity with different neuron models.

    (A) Three different curvatures of input-output functions F of the neuron (black) lead to different shapes (curvatures) of the combinatorial term p_cf (red, see Eq. 4). For fixed presynaptic activity and postsynaptic stimulation, the lines are calculated for continuous values of S, whereas the dots mark successive discrete values. (B) When the combinatorial influences p_cf (red, Eq. 4) are smaller than the logarithmic deletion probability p_d (black) for a certain value of S (grey shaded area), the long-term equilibrium probability for S synapses is higher than the probability for S − 1 synapses (see Eq. 2) and vice versa. Thus, intersections of both terms indicate peaks and valleys of the probability distribution p[S]. To cover all six possible intersection structures between p_cf and p_d, we show example snippets for the p_d with a variety of curvatures and slopes. (C) The shape of the long-term equilibrium probability distributions (schematically) for the number of synapses of the plastic connection can be derived from the intersection structures in (B): each intersection in (B) leads to a local extremum in the probability distribution in (C). Furthermore there can be peaks at the boundaries. Note, experimental connectivity (Fig. 1A) corresponds to case six which has two intersections. As p_cf is monotonically growing, two intersections are only possible for growing p_d-functions.

    Reconfigurable Systolic Array: From Architecture to Physical Design for NML

    NanoMagnet logic (NML) is among the emerging technologies that might replace CMOS in the next decades. Given its physical characteristics, the use of parallel architectures with a regular layout that avoid long interconnection signals is advised to better exploit the potential of this technology (and of other similar ones). Systolic arrays (SAs) are among these architectures, being composed of a grid of identical processing elements that are locally interconnected. However, they are usually implemented to execute only a small set of algorithms, and for this reason, throughout the years, they have not been an appealing solution for CMOS. To seriously analyze the potential of NML, complex architectures must be conceived, and their physical implementation explored considering realistic technological constraints. With the increasing complexity of NML circuits, two issues then arise: 1) the need for a regular structure that at the same time helps to reduce the intrinsic pipelining nature of NML and can be configured for several applications without developing a dedicated design for each algorithm and 2) the capability to synthesize, place and route NML circuits, which is fundamental to demonstrate the feasibility of the architecture under two important conditions: efficiently managing the complexity of the design and sticking to the characteristics that are technologically feasible at the time of writing. In this paper, we address these issues by presenting a new reconfigurable SA that can be programmed to execute different algorithms, and we provide two examples to show its working principle. Moreover, the array is synthesized and simulated with the aid of the first real tool for nanotechnology circuits, which we have conceived: the Torino Politecnico Nanotechnology tool. The joint contribution at both the architectural and physical design levels is a relevant step forward in the state of the art in demonstrating the potential of this emerging technology.

    Logic-in-Memory: A Nano Magnet Logic Implementation

    In most computational systems, memory access represents a relevant bottleneck for circuit performance. The execution speed of algorithms is severely limited by memory access time. An emerging technology like NanoMagnet Logic (NML), whose magnetic nature leads to an intrinsic memory ability, therefore represents a very promising opportunity to solve this issue. NanoMagnet Logic is the ideal candidate to implement the so-called Logic-In-Memory (LIM) architecture. But how is it possible to organize an architecture where logic and memory are mixed rather than separate entities? In this paper we try to address this issue by presenting our recent developments on LIM architectures. We originally conceived a LIM architecture without considering any technological constraints. Here we present the first adaptation of that architecture to NanoMagnet Logic technology. The architecture is based on an array of identical cells developed on three virtual layers, one for logic, one for memory and one for information routing. These three virtual layers are mapped onto two physical layers, exploiting all our recent improvements on NanoMagnet Logic technology, which are validated with the help of low-level simulations. The structure has been tested by implementing two different algorithms, a sort algorithm and an image manipulation algorithm. A complete characterization in terms of area and power is reported. The structure presented here is therefore the first step of an ongoing effort directed toward the development of a truly innovative architecture.